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Abstract 

We  propose  two  novel  minimal  solvers  which  advance 
the  state  of  the  art  in  satellite  imagery  processing.  Our 
methods  are  efficient  and  do  not  rely  on  the  prior  existence 
of  complex  inverse  mapping  functions  to  correlate  2D  im¬ 
age  coordinates  and  3D  terrain.  Our  first  solver  improves 
on  the  stereo  correspondence  problem  for  satellite  imagery, 
in  that  we  provide  an  exact  image-to-object  space  map¬ 
ping  (where  prior  methods  were  inaccurate).  Our  second 
solver  provides  a  novel  mechanism  for  3D  point  triangu¬ 
lation,  which  has  improved  robustness  and  accuracy  over 
prior  techniques.  Given  the  usefulness  and  ubiquity  of  satel¬ 
lite  imagery,  our  proposed  methods  allow  for  improved  re¬ 
sults  in  a  variety  of  existing  and  future  applications. 

1.  Introduction 

Commercial  satellite  imagery,  captured  on  a  variety  of 
hardware  platforms,  has  become  widely  available  by  multi¬ 
ple  vendors.  It  is  a  valuable  source  of  information  and  has 
received  much  attention  by  the  computer  vision  community. 
For  instance,  it  has  driven  work  on  geo-localization  l20l. 
meteorological  and  oceanographic  forecasts  ED,  urban 
change  detection  11241.  and  road  detection  (25). 

To  effectively  leverage  this  imagery,  the  relationship  be¬ 
tween  the  underlying  3D  scene  geometry  and  its  2D  image 
has  to  be  known.  This  is  typically  provided  by  a  camera 
model.  However,  defining  a  precise  physical  camera  model 
is  complicated  due  to  the  complex  hardware  configuration 
and  the  capture  process.  It  is  standard  practice  to  construct 
an  approximation  to  the  mapping  between  the  3D  object 
space  and  the  image. 

The  representation  that  has  been  used  to  approximate 
this  mapping  for  over  two  decades  is  the  Rational  Poly¬ 
nomial  Coefficient  (RPC)  model.  It  was  first  introduced 
by  Hartley  and  Saxena  lfl4l  and  greatly  contributed  to  the 
thriving  of  satellite  imagery  applications.  The  primary  rea¬ 
son  for  its  success  is  its  ability  to  maintain  the  full  accuracy 
of  physical  sensor  models,  its  unique  characteristic  of  sen¬ 
sor  independence,  and  its  real-time  calculation  fl8l.  It  is 


included  in  the  standards  of  the  NITF  image  format  m  ,  and 
is  available  from  most  satellite  image  vendors  El- 

However,  one  challenge  faced  by  many  works  utilizing 
satellite  imagery  and  the  RPC  model  is  that  the  bidirec¬ 
tional  mapping  between  the  3D  object  and  2D  image  spaces 
may  be  unknown.  Specifically,  vendors  supplying  the  satel¬ 
lite  imagery  may  only  provide  the  one-directional  mapping 
function  from  the  3D  object  space  to  2D  image  (known  as 
a  forward  RPC  model).  The  inverse  mapping,  termed  the 
inverse  RPC  model,  utilizes  2D  image  coordinates  and  an 
associated  altitude  in  order  to  infer  the  resulting  3D  coordi¬ 
nate.  Approaches  attempting  to  solve  for  this  inverse  map¬ 
ping  often  lack  accuracy  ED-  In  this  work,  we  propose  two 
novel  high-accuracy  minimal  solvers  (see  Fig.  [TJ  for  use 
with  satellite  imagery  only  requiring  the  forward  RPC. 

The  first  minimal  solver  for  the  inverse  mapping  targets 
the  domain  of  extracting  depth  information  from  pairs  of 
satellite  images  (stereo).  Previous  works  have  explored  this 
domain  on  Earns,  but  suffer  from  various  limitations. 
For  instance,  a  typical  approach  relying  on  volumetric  mod¬ 
els  can  have  high  memory  requirements  0.  Furthermore, 
recent  image-to-image  approaches  rely  on  the  existence  of 
an  inverse  RPC  model,  which  may  be  missing  or  inaccu¬ 
rate  dang  ED-  Our  approach  avoids  these  issues  by 
proposing  an  efficient  minimal  solver,  which  bypasses  the 
issue  of  explicitly  constructing  a  single  inverse  RPC  model 
for  the  entire  image.  Instead,  each  coordinate  is  processed 
independently,  providing  a  highly-accurate  inverse  mapping 
conforming  to  the  provided  forward  RPC  model. 
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The  second  novel  minimal  solver  targets  the  application 
of  3D  point  triangulation.  In  typical  computer  vision  ap¬ 
plications,  3D  point  triangulation  is  the  task  of  finding  the 
optimal  3D  point  position  given  correspondences  among 
feature  points  in  calibrated  images.  However,  triangula¬ 
tion  under  the  forward  RPC  model  is  much  more  difficult, 
mainly  due  to  the  increased  complexity  of  the  camera  model 
l29l.  Moreover,  since  no  explicit  epipolar  constraints  exist 
for  the  RPC  camera  model,  feature  mismatches  cannot  be 
easily  detected  and  removed.  Our  approach  solves  these  is¬ 
sues  by  first  proposing  an  effective  strategy  for  a  minimal 
solver  based  triangulation  under  the  forward  RPC  model, 
and  then  by  making  the  observation  that  only  three  of  the 
four  coordinates  in  a  correspondence  are  needed  to  solve 
the  equations.  The  fourth  coordinate  can  then  be  used  both 
for  robustness  (enabling  us  to  reject  false  matches),  as  well 
as  for  refinement  (leveraging  redundant  information).  This 
is  novel  compared  to  typical  triangulation  schemes,  and  we 
demonstrate  the  usefulness  and  effectiveness  of  this  strategy 
in  our  results. 

Given  the  usefulness  and  ubiquity  of  satellite  imagery 
and  the  RPC  model,  our  proposed  methods  allow  for  im¬ 
proved  results  in  a  variety  of  existing  and  future  applica¬ 
tions.  For  low-level  problems  and  stereo  computations,  our 
exact  image-to-object  space  mapping  increases  the  accuracy 
of  3D  reconstructions.  For  high-level  tasks,  we  provide  im¬ 
proved  feature  matching  and  outlier  rejection  mechanisms 
to  improve  the  results. 

2.  Related  Work 

Most  applications  leveraging  satellite  imagery  for  recon¬ 
struction  and  analysis  require  a  known  camera  model  in  or¬ 
der  to  correctly  function.  A  popular  technique  to  provide 
the  required  camera  calibration  is  the  RPC  model  Ifl4l.  Af¬ 
ter  its  introduction,  Tao  et  al.  1281  studied  the  characteristics 
of  the  RPC  model  and  proposed  an  iterative  approximation 
solution  to  estimate  RPC  models  from  the  physical  sensor 
model  or  using  ground  control  points.  Hu  el  al.  ca  pro¬ 
posed  to  exploit  control  information  to  further  improve  the 
accuracy  of  the  solution  for  RPC  models.  To  further  boost 
accuracy,  Dial  et  al.  ED  leverage  bundle  adjustment  to  refine 
the  the  RPC  coefficients.  All  of  the  above  methods  provide 
numerical  approximations  to  the  RPCs  for  satellite  imagery. 
However,  numerical  stability  is  and  was  a  major  challenge 
in  these  solutions,  which  had  typically  been  mitigated  by 
leveraging  additional  information  such  as  sensor  character¬ 
istics  or  ground  control  points  f28ll9ll.  In  this  paper  we  pro¬ 
pose  a  solution  to  achieve  a  numerically  stable  and  accurate 
solution  to  determine  the  image-to-object-space  mapping 
given  an  RPC  model  describing  the  object-to-image-space 
transformation. 

Once  the  RPC  model  is  known,  computer  vision  and 
photogrammetry  applications  have  leveraged  them  in  a  va¬ 


riety  of  applications.  For  instance,  given  a  pair  of  match¬ 
ing  features  from  two  stereo  satellite  images,  Tao  et  al. 
129:1  reconstruct  the  corresponding  approximate  3D  point 
by  an  iterative  approach  that  relies  on  the  first-order  Tay¬ 
lor  approximation  of  the  RPC  model.  However,  lacking  a 
rigorous  analytical  convergence  proof  and  outlier  removal 
mechanisms,  the  automatic  processing  of  satellite  images  is 
impeded.  Several  other  approaches  also  provide  methods 
to  perform  dense  stereo  reconstruction  from  satellite  im¬ 
ages  mmmm.  Hirschmiiller  tia  used  an  approxi¬ 
mation  of  global  optimization  to  efficiently  solve  the  dense 
stereo  problem.  Oh  et  al.  f26ll  rectify  the  satellite  images 
by  approximating  the  epipolar  curve  with  piecewise  straight 
lines,  and  then  search  for  correspondences  along  the  line 
of  the  rectified  images  using  normalized  cross  correlation 
measures.  It  is  unclear  if  the  linear  approximations  work 
for  images  with  large  distortions.  Additionally,  this  method 
also  requires  that  the  inverse  RPC  parameters  are  available. 
Wang  et  al.  01  efficiently  compute  depthmaps  of  large 
satellite  images  by  testing  a  reduced  set  of  disparity  hy¬ 
potheses  for  each  pixel.  These  methods  are  very  effective 
when  handling  high-resolution  satellite  images,  but  a  bi¬ 
directional  mapping  is  required  for  such  methods  to  work. 
We  propose  a  novel  minimal  solver  to  accurately  establish 
this  bi-directional  mapping. 

3.  RPC  model 

There  are  two  broadly  used  classes  of  sensor  models,  the 
physical  sensor  model  and  the  generalized  sensor  model. 
The  physical  sensor  model  deduces  the  imaging  character¬ 
istics  from  the  physical  imaging  process  of  the  specific  sen¬ 
sor  used  for  acquisition.  The  parameters  describe  the  posi¬ 
tion  and  orientation  of  the  sensor  with  respect  to  an  object- 
space  coordinate  system.  These  physical  sensor  models  typ¬ 
ically  yield  high  modeling  accuracy  when  used  for  map¬ 
ping.  However,  the  physical  models  across  different  satel¬ 
lites  may  vary  significantly  and  do  not  provide  a  decent  gen¬ 
eralization.  Accordingly,  image  analysis  algorithms  often 
have  to  be  tailored  for  the  specific  satellite. 

To  overcome  this  limitation  of  specificity  and  to  achieve 
generalization  across  sensor  platforms  (satellites),  general¬ 
ized  sensor  models  were  proposed.  In  the  generalized  sen¬ 
sor  model,  the  transformation  between  the  image  and  the 
object  space  is  represented  as  a  general  mapping  function 
without  modeling  the  specific  physical  imaging  process  of 
the  satellite.  There  are  a  number  of  common  representa¬ 
tions  of  the  mapping  function,  for  instance  polynomials  or 
rational  functions.  In  either  of  these  models,  the  parameters 
do  not  carry  any  physical  meanings  related  to  the  imaging 
process  as  is  the  case  for  the  physical  sensor  models,  so  the 
algorithms  are  fixed  and  can  be  directly  apply  across  differ¬ 
ent  satellites.  Utilizing  the  RPC  model  to  replace  physical 
sensor  models  in  photogrammetric  mapping  is  becoming  a 
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standard  way  for  accurate,  economical,  and  fast  mapping. 
Next,  we  describe  the  RPC  model  in  detail. 

To  boost  accuracy,  the  RPC  model  typically  operates  in 
a  normalized  space.  The  normalization  functions,  for  the 
unnormalized  object  space  coordinates  A',  Y,  and  Z ,  are 
denoted  as  <I> A ,  <f>Y,  and  <PZ,  given  by: 

x  =  $x(X)  =  (X-X0)/Xs, 

Y  =  $y(Y)  =  (Y  -  Y0)/Y„ 

Z  =  $z(Z)  =  (Z-Z0)/ZS:  (1) 

with  X0,  Ya,  and  Z0  denoting  the  offset  of  object  coordi¬ 
nates,  and  Xs,  Ys  and  Zs  representing  the  corresponding 
scale  factors  for  the  coordinate  directions.  Similarly,  the 
image  coordinates  are  normalized  through: 

r  =  (r-r0)/ra,  c=(c-c0)/cs,  (2) 

with  r  denoting  the  normalized  row  coordinate  of  the  im¬ 
age  and  c  the  normalized  column  in  the  image,  and  r0 ,  c0 
being  the  offsets  of  the  image  coordinates.  The  scale  fac¬ 
tors  for  the  image  coordinates  are  denoted  by  rs,cs.  The 
normalization  parameters  are  set  to  makes  sure  that  the  3D 
region  covered  by  the  satellite  image  has  normalized  coor¬ 
dinates  within  the  cube  of  [— 1, 1]  x  [—1,1]  x  [—1,1],  and 
that  the  normalized  image  coordinates  are  within  the  range 
of  [—1, 1]  x  [—1,1].  There  parameters  are  provided  by  the 
satellite  image  vendors  with  the  RPC  model.  For  conve¬ 
nience,  we  denote  M  =  (X ,Y,Z),  M  =  (X,Y,Z),  and 
the  3D  point  normalization  as  M  = 

Then,  the  forward  RPC  model  relates  the  normalized  ob¬ 
ject  coordinates  A,  Y,  Z  to  the  normalized  image  coordi¬ 
nates  (r,  c): 

=  p1(x,y,z)  =  p3(x,y,z) 

r  P2  (X,  y,  z)  ’  c  p4(a,  y,  Z)  ’  1  ’ 

with  the  P\  i  =  1,2,  3,4  being  polynomials  of  degree 
three.  Also,  notice  that  in  Eq.  ([3]i  the  row  and  column  of 
the  image  coordinates  are  independently  computed.  The 
P\  i  =  1,  2,  3, 4  are  of  the  form  (superscript  i  is  omitted 
for  simplicity) 

P  =  clq  T  oqA  T  0,2 Y  T  o$Z  T  04.X Y  -f-  o^X  Z 
-p  a6y Z  -p  07X2  -p  agT  2  -p  ogZ2  -p  aioAk  Z 
+  an  A  2y  +  ai2A2Z  +  o13XY2  +  auY2Z 
+  ai5AA2  -p  cliq\  Z2  -p  aizA3  -p  ai§y3  -p  a±gZ3, 

(4) 

where  a:j,  j  =  0, . . . ,  19  are  the  polynomial  coefficients. 
It  is  understood  that  distortions  caused  by  the  optical  pro¬ 
jection  during  the  satellite  image  capture  can  generally  be 
represented  by  the  ratios  of  first-order  terms  of  the  ln .  Cor¬ 
rections  such  for  the  earth  curvature,  the  atmospheric  re¬ 
fraction,  etc.,  can  be  well  approximated  by  the  second-order 


terms.  Additional  unmodeled  distortions  are  approximated 
through  the  high-order  components,  where  one  example  of 
these  distortions  is  the  camera  vibration  that  can  be  mod¬ 
eled  by  the  third-order  terms  |[28l.  Typically,  the  low-order 
monomial  terms  in  Eq.  Q  have  a  significantly  larger  coef¬ 
ficients  than  high-order  monomials. 

The  inverse  RPC  model  performs  image-to- space  trans¬ 
formation.  Given  the  input  of  the  normalized  row  and  col¬ 
umn  (r,  c),  and  the  normalized  altitude  Z  of  the  correspond¬ 
ing  3D  point,  it  computes  associated  X  and  Y.  The  model 
is  almost  the  same  to  that  of  forward  RPC  model  as  follows, 

P5(r,c,Z)  P7(r,c,A) 

P6(r,c,y)’  P8{r,c,Z)’ 

where  Pl,  i  =  5,  6,  7, 8  has  the  form  of  polynomial  as  in 
Eq.  <|4]).  While  some  satellite  image  vendors  deliver  both  the 
forward  and  the  inverse  RPC  model  with  their  data,  most  of 
them  only  provide  the  forward  RPC  model.  We  also  find  es¬ 
timating  the  inverse  RPC  model  is  difficult  without  detailed 
knowledge  of  the  sensor  and/or  ground  control  points.  In 
the  remainder  of  this  paper,  we  use  the  term  RPC  model  to 
represent  forward  RPC  model  since  we  assume  the  inverse 
RPC  is  unknown. 

4.  Inverse  mapping 

This  section  proposes  the  parameterizations  of  the  in¬ 
verse  mapping,  a  minimal  solver  based  on  a  polynomial 
equation  system,  and  our  proposed  solution  using  the 
Grobner  basis  method.  We  close  the  discussion  with  the 
application  of  the  novel  minimal  solver  to  stereo  estimation. 

4.1.  Inverse  Mapping  Parameterization 

Unlike  traditional  methods  that  explicitly  model  the  in¬ 
verse  mapping  and  estimate  its  parameters,  our  method 
maps  each  individual  point  without  any  assumption  about 
the  model  of  the  inverse  mapping  (inverse  RPC). 

To  compute  the  inverse  mapping,  we  rewrite  the  RPC 
mapping  from  Equation  Q  as: 

p1  (x,  y,  z)  -  rP2{x,  y,  z)  =  0 
p3  (a,  y,  z)  -  cpA  (a,  y,  z)  =  0  w 

which  defines  a  polynomial  equation  system.  Assuming  a 
known  normalized  altitude  Z  and  after  expanding  each  of 
Pl  in  Eq.  (Jbj),  we  can  obtain  a  polynomial  equation  system 
of  two  equations.  Each  has  degree  three  in  the  unknown 
variables  X  and  Y.  Using  computational  algebra  software 
(e.g.  Maple  or  Macaulay2  03)  it  can  be  confirmed  that  the 
equation  system  of  Eq.  |6])  has  up  to  nine  solutions.  Next, 
we  describe  the  Grobner  bases  method  for  our  solution. 

4.2.  Solving  polynomial  equations 

In  order  to  solve  the  polynomial  equation  system  of 
Eq.  §.,  we  leverage  the  Grobner  basis  method  which  has 


740 


1 


1 
3 
5 
7 
9 
11 

1  3  5  7  9  11  13  15  17  19  21 

(a)  Elimination  template  of  size  12  X  20 
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(b)  Elimination  template  in  Echelon  form 


(c)  Action  matrix 


Figure  2:  The  Grobner  basis  method  for  inverse  mapping  problem.  The  black  dots  represent  non-zero  values.  The  value  of 
the  three  rows  in  the  action  matrix  is  obtained  from  the  three  rows  in  figure  (b). 


achieved  great  success  in  solving  minimal  problems  en¬ 
countered  in  computer  vision  [27;  1231.  From  the  existing 
standard  approaches  01  [22j  [27l  [3),  we  opt  to  use  the  effi¬ 
cient  and  accurate  method  proposed  by  Kukelova  et  al.  m 
as  we  empirically  found  it  to  produce  very  accurate  results. 

Solving  a  polynomial  equation  system  typically  involves 
two  steps.  The  first  step  is  an  offline  process  in  which  the 
specific  behavior  of  the  problem  is  studied,  and  a  pattern 
for  the  solution  is  determined.  The  second  step,  an  online 
process,  uses  the  determined  pattern  to  set  up  and  solve  the 
problem  for  a  given  input. 

Starting  with  the  offline  process,  it  consists  of  two  main 
parts.  In  the  first  part,  the  problem  (Eq.  ([6]))  is  mod¬ 
eled  in  a  computer  algebra  software  package  (such  as 
Macaulay2  DU).  We  then  leverage  the  built-in  functions 
of  the  software  package  to  first  compute  the  Grobner  bases, 
and  then  the  bases  of  the  quotient  ring  (which  in  turn  de¬ 
fines  the  number  of  solutions  to  our  problem).  During  this 
computation,  in  order  to  avoid  issues  of  numerical  preci¬ 
sion,  the  problem  is  represented  using  a  finite  field  (which 
uses  fixed,  discrete  values  instead  of  a  floating-point  rep¬ 
resentation).  For  these  computations,  we  use  the  GrevLex 
as  the  monomial  ordering,  as  this  typically  results  in  fewer 
overall  computations  compared  to  other  alternative  ordering 
schemes  (such  as  lexicographic)  (71 . 


The  second  part  of  the  offline  process  seeks  to  construct 
a  modified  set  of  polynomial  equations  that  can  be  used  for 
action  matrix  construction.  In  order  to  achieve  this,  we  first 
construct  an  elimination  template  (e.g.  Fig.  2a  I  which  rep¬ 
resents  our  initial  equations  (the  first  two  rows  of  the  tem¬ 
plate  correspond  to  Eq.  ([6])).  Here,  the  polynomial  equa¬ 
tions  are  represented  as  MX  =  0,  where  X  is  a  vector 
of  monomial  terms,  and  M  is  a  matrix  of  polynomial  co¬ 
efficients  (the  elimination  template  |f3|).  Given  these  ini¬ 
tial  equations,  we  then  multiply  them  by  various  monomial 
terms,  yielding  additional  entered  rows  in  the  elimination 
template.  The  purpose  of  these  additional  entries  is  to  en¬ 


able  the  successful  construction  of  an  action  matrix  in  a  fol¬ 
lowing  step.  Given  the  elimination  template,  we  then  per¬ 
form  Gauss-Jordan  elimination  to  reduce  the  matrix  to  ech¬ 
elon  form.  Once  in  echelon  form,  the  matrix  has  certain 
properties  which  enable  the  construction  of  an  action  ma¬ 
trix  by  leveraging  the  bases  of  the  quotient  ring  computed 
above.  The  exact  method,  properties,  and  theory  for  how 
to  construct  the  elimination  template  and  action  matrix  are 
complex  and  beyond  the  scope  of  this  paper.  Therefore,  we 
refer  readers  to  01221  ED  for  more  details. 

By  successfully  constructing  the  action  matrix,  we  now 
have  a  pattern  for  how  the  original  set  of  equations  can 
be  solved.  The  online  portion  of  the  processing  now  uti¬ 
lizes  actual  input  values  to  form  the  elimination  template, 
perform  Gauss-Jordan  elimination,  and  construct  the  action 
matrix.  Then,  the  final  numerical  solutions  to  the  polyno¬ 
mial  equations  can  be  computed  from  the  eigenvectors  of 
the  action  matrix.  As  much  of  this  process  has  been  de¬ 
tailed  before  0122,1 1 23 1,  we  now  only  provide  the  details  of 
the  offline  procedures  for  our  specific  problem. 

For  our  inverse  mapping  problem,  the  polynomial 
equation  system  consists  of  two  polynomials,  with  each 
containing  10  monomials  (Eq.  <©»•  The  polynomial 
equations  are  multiplied  by  the  following  monomials 
{X,  Y,  X2,  XY,  Y 2 } ,  and  the  resulting  equations  are  added 
to  form  the  elimination  template  (Fig.  [2aJ).  Then,  the  eche¬ 
lon  form  of  the  elimination  template  is  computed  by  Gauss- 
Jordan  elimination  (Fig.  [2bji,  and  the  action  matrix  is  con¬ 
structed.  Here,  the  negated  values  of  row  6,  10  and  12  in 
Fig.  [2b]  are  used  for  action  matrix  construction  as  shown  in 
Fig. pel  After  constructing  the  action  matrix,  the  7th  and  8th 
elements  of  the  eigenvectors  are  the  solutions  of  the  poly¬ 
nomial  equation  system. 

4.3.  Inverse  mapping  in  stereo 

After  describing  how  to  solve  for  the  inverse  mapping, 
we  next  detail  its  usage  within  the  context  of  stereo  estima- 
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tion  for  a  pair  of  stereo  images  I\  and  I2.  For  convenience 
of  notation,  we  use  the  subscript  1  and  2  to  denote  the  asso¬ 
ciated  properties  for  I\  and  1 2. 

In  traditional  image  based  stereo  for  pinhole  cameras, 
it  is  known  that  points  corresponding  to  point  p\  in  the 
first  image  l\  can  only  be  found  along  the  corresponding 
epipolar  line  hipi )  in  the  second  image  I2.  The  epipo- 
lar  line  hipi )  effectively  represents  the  depth  ambiguity 
of  the  point  p\  in  the  second  image  I2,  i.e.  the  position 
along  the  line  corresponds  to  the  depth  of  the  point.  Hav¬ 
ing  this  reduction  to  a  one-dimensional  search  space  (epipo¬ 
lar  line)  significantly  reduces  the  computational  expense  of 
the  stereo  estimation.  Leveraging  the  inverse  mapping  in 
combination  with  the  RPC  model,  we  are  also  able  to  es¬ 
tablish  a  one  dimensional  search  space  for  image-to-image 
stereo  from  satellite  images.  In  contrast  to  the  pinhole  cam¬ 
era  model  that  has  an  epipolar  line,  the  satellite  images  have 
epipolar  curves  with  unknown  analytical  formulation  lf26]|. 
However,  using  the  inverse  RPC  mapping  we  can  still  nu¬ 
merically  identify  the  epipolar  curve  in  the  second  image. 

Starting  from  an  image  point  (rx ,  C\ )  in  pixel  coordi¬ 
nates,  we  first  apply  the  image  point  normalization  of  Eq. 
(|2]»  to  obtain  the  normalized  point  (n,  c\)  in  the  first  satel¬ 
lite  image.  To  explore  the  ambiguity  space  for  the  normal¬ 
ized  altitude  Z\  of  the  corresponding  object  point,  we  can 
sample  the  normalized  altitude  in  the  range  [—1, 1]  (Sec. [5]). 
For  each  specific  sample  Z\,  the  inverse  mapping  can  be  ap¬ 
plied  to  compute  the  corresponding  normalized  3D  object 
point  M\  =  (Xj ,  Tj\  Z\).  The  set  of  samples  {Z{}  defines 
a  curve  in  the  object  space  corresponding  to  the  depth  ambi¬ 
guity  of  the  image  point  (?i ,  C\ ) .  After  de-normalizing  from 


the  normalized  3D  points  M{  to  real  object  space,  apply  the 
normalization  of  the  second  image  and  the  RPC  model  of 
I2  to  compute  the  2D  point  (r%.  c{)  on  72.  In  practice  we 
adjust  the  sampling  rate  to  produce  a  sub-pixel  sampling  in 
the  second  satellite  image.  Fig.  [3a]  shows  a  2D  point  and 
its  corresponding  epipolar  curve  for  a  pair  of  stereo  satellite 
images.  The  particular  epipolar  curve  in  Fig.  3b  is  close  to  a 
line,  but  epipolar  curves  generally  show  different  curvatures 
in  different  parts  of  the  image.  As  in  the  case  of  standard 
stereo  estimation,  the  limited  altitude  range  further  short¬ 
ens  the  epipolar  curve  in  the  second  image.  This  leads  to  a 
further  reduced  search  space  in  the  second  satellite  image. 


5.  Triangulation 

We  now  describe  our  method  for  robust  satellite  sparse 
point  triangulation,  which  allows  us  to  efficiently  remove 
outlier  correspondences  during  feature  triangulation.  We 
address  the  problem  of  obtaining  the  normalized  3D  point 
from  a  pair  of  correspondences  with  normalized  coordinates 
(n,ci)  in  image  Ii  and  (r2,C2)  in  image  I2  with  known 
RPC  models.  Our  key  observation  is  that  it  is  sufficient  to 
use  three  out  of  the  four  coordinate  values  to  triangulate  the 


(a)  (b) 


Figure  3:  The  yellow  point  on  the  left  image  and  its  corre¬ 
sponding  blue  epipolar  curve  on  the  right  image.  The  white 
rectangle  is  the  region  of  feature  candidates  for  matching. 


3D  point.  Hence,  the  last  coordinate  value  can  be  used  for 
correspondence  verification.  Without  loss  of  generality,  we 
use  n,  ci  and  r2  for  triangulation,  and  C2  for  verification. 

For  a  3D  point  ( X ,  Y,  Z)  in  the  real  object  space  that  is 
observed  by  I\  and  1 2,  and  given  the  known  RPC  models 
(Eq.  <[3]»)  for  images  I\  and  /2,  we  can  construct  the  follow¬ 
ing  polynomial  equation  system 

r  - nAW =  0 

{  p1w^r^f)-ciP1w^r^f)=o  (7) 

(  )  -  r2P| ($2Y,  $f)  =  0: 


where  $  <1)V  and  (\>z  are  the  known  normalization  func¬ 

tions  for  the  point  in  the  object  space  given  by  Eq.  0- 
Eq.  (7])  defines  a  system  of  three  polynomial  equations,  and 
each  of  the  polynomial  has  degree  3  in  the  unknown  vari¬ 
ables  X,  Y  and  Z.  Similarly  to  the  inverse  mapping,  Eq.  j7]l 
has  in  general  up  to  27  solutions,  some  of  which  have  real 
values.  Solving  Eq.  (j7]l  again  uses  the  framework  of  the 
Grobner  basis  as  briefly  described  below. 

Solving  the  polynomial  equation  system  for  triangula¬ 
tion  is  similar  to  that  of  inverse  mapping,  but  slightly  more 
complicated.  The  original  3  polynomials  are  multiplied 
by  all  the  34  monomials  in  variables  {X,Y,Z}  of  de¬ 
grees  from  1  up  to  4,  and  added  to  the  original  3  equa¬ 
tions.  The  number  of  total  equations  is  35  *  3  =  105,  but 
within  which  only  93  are  linearly  independent.  By  choos¬ 
ing  93  linearly  independent  equations,  we  have  the  elim¬ 
ination  template  of  size  93  x  120  (Fig.  |4aJ).  After  com¬ 
puting  the  echelon  form  (Fig.  [4hji.  the  negated  values  in 
the  submatrix  from  the  rows  {36,  62,  63,  78,  79,  80,  89}  and 
columns  {64, 83 — 85,  95 — 100, 104 — 120}  are  used  in  row 
{1 — 3,  5 — 7, 11}  of  the  action  matrix  (Fig.  4c  1.  The  24th, 
25th,  and  26th  elements  of  the  eigenvectors  of  the  action 
matrix  are  solutions  to  the  polynomial  equation  system. 

After  estimating  the  3D  object  point  M* ,  we  next  need 
to  evaluate  the  obtained  solution  leveraging  the  remaining 
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(a)  Elimination  template  of  size  93  x  120 


(b)  Elimination  template  in  Echelon  form 


(c)  Action  matrix 


Figure  4:  The  Grobner  basis  method  for  triangulation  problem.  The  eliminate  template  shapes  may  vary  depending  on  which 
linearly  independent  equations  are  chosen,  but  the  shape  of  the  echelon  form  is  fixed. 


point  coordinate  c2 .  During  this,  c2  acts  as  a  role  to  remove 
the  outlier  correspondences  and  the  fake  real  solutions  of 
the  polynomial  equation  system.  The  estimated  3D  point 
M*  =  (X*  ,Y* ,  Z*)  is  projected  onto  I2  to  compute  the 
column  62*.  If  the  distance  d*  =  |c2  —  cj  |  is  below  a  prede¬ 
fined  threshold  (e.g.  2  pixels  in  our  experiments),  the  cor¬ 
respondence  is  considered  to  be  correct.  Since  Eq.  0  gen¬ 
erates  multiple  real  valued  solutions,  each  real  valued  solu¬ 
tion  is  tested  for  correctness.  If  none  of  the  solutions  pass 
the  test,  the  2D  match  is  considered  to  be  an  outlier.  For 
correct  matches  (inlier),  typically  only  one  solution  passes 
the  inlier  test.  Whenever  multiple  solutions  pass  the  test,  we 
accept  the  one  with  the  smallest  distance  d*. 

Beyond  serving  for  validation,  the  redundancy  during  the 
triangulation  can  also  be  used  to  further  refine  the  estimated 
3D  point  M*  starting  from  the  obtained  solution.  We  pro¬ 
pose  to  determine  the  refined  3D  point  by  minimizing  the 
projection  error  in  both  images  through 
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where  <f \  are  the  normalization  functions.  We  leverage  the 
Levenberg-Marquardt  algorithm  to  perform  the  minimiza¬ 
tion  of  the  non-linear  cost  function. 

Triangulation  requires  estimating  the  correspondences 
from  image  features  in  two  images.  Since  the  satellite  im¬ 
ages  are  typically  huge  and  contain  millions  of  features,  it  is 
inappropriate  to  exhaustively  compute  the  pairwise  distance 
between  every  two  features.  Not  only  this  is  computation¬ 
ally  expensive,  but  also  the  feature  descriptor  is  less  distinc¬ 
tive  in  presence  of  millions  of  features.  To  tackle  this  prob¬ 
lem,  we  use  the  observation  that  the  epipolar  curve  typically 
covers  a  small  region  of  the  satellite  image.  For  the  feature 
point  shown  in  Fig.  [3aJ  we  only  search  within  the  white 


rectangle  for  correspondence  (Fig.  3b  I.  The  white  rectangle 
is  determined  by  the  two  end  points  of  the  epipolar  curve, 
which  is  readily  computed  using  the  inverse  mapping. 


6.  Experiments 

To  evaluate  inverse  mapping  and  triangulation,  we  use 
both  synthetic  data  and  real  data  for  quantitative  and  quali¬ 
tative  evaluations.  Both  solvers  are  implemented  using  C++ 
with  double-precision  arithmetic.  To  evaluate  the  numerical 
errors,  we  also  implement  the  triangulation  method  with  64- 
bit  significand  (mantissa)  as  opposed  to  53-bit  for  double 
precision.  The  running  time  on  Intel  Xeon  E5-2597  @2.70 
GHz  for  each  of  the  implementations  are  0.064ms,  0.9ms 
and  164ms,  respectively.  Accordingly,  the  efficiency  of  our 
implementations  with  double  arithmetic  is  sufficient  to  en¬ 
able  diverse  application  scenarios. 

6.1.  Inverse  mapping 

To  quantitatively  evaluate  our  proposed  inverse  RPC 
mapping  estimation,  we  generate  synthetic  3D  points  from 
a  random  uniform  distribution  within  the  cube  of  [— 1  ,1]  x 
[-1,1]  x  [-1,1].  In  order  to  yield  the  image  coordinates, 
these  3D  points  are  projected  into  virtual  images  I\  and 
I2  using  their  known  RPCs  to  produce  normalized  image 
coordinates.  To  approximate  realistic  experimentation  sce¬ 
narios,  we  use  four  sets  of  real  RPC  model  parameters  ex¬ 
tracted  from  two  different  satellites,  including  GeoEye-1 
and  Worldview-2.  Each  test  is  performed  on  10, 000  ran¬ 
dom  samples  for  each  set  of  RPC  parameters.  Error  charac¬ 
teristics  were  stable  across  the  four  different  sets  of  RPCs. 
Hence,  quantitative  results  describe  the  performance  over 
the  aggregated  sample  of  40,000  tests. 

Accuracy.  Our  proposed  inverse  mapping  computes  the 
normalized  latitude  and  longitude  of  the  3D  object  point. 
The  estimated  normalized  coordinates  (X* ,Y*)  are  com- 
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Figure  5:  Our  inverse  mapping  has  7.8  •  10  14  median  error. 
LS  in  Fig.[5a|represents  the  method  proposed  by  OTI. 


(a)  Our  inverse  mapping  (b)  The  method  of  ED 


Figure  6:  The  error  distribution  of  inverse  mapping  across 
the  whole  image. 


pared  with  the  ground  truth  using  the  Euclidian  distance 
||(X*,y*)  —  (X9  ,Y9)\\2,  where  the  superscript  g  denotes 
the  ground  truth.  The  resulting  error  distribution  is  shown 
in  Fig.  [5a|  As  can  be  seen  from  Fig.  [5a]  our  proposed  algo¬ 
rithm  typically  has  errors  of  less  than  10~  ,  which  is  well 
below  the  demands  of  common  applications. 

To  compare  with  one  of  the  state-of-the-art  methods 
in  estimating  the  inverse  mapping,  we  implemented  the 
method  of  Yang  ffTTl.  Their  method  uses  the  inverse  RPC 
model  in  Eq.  <0  and  given  the  ground  truth  data  between 
3D  and  2D  correspondences,  the  parameters  of  the  model  is 
estimated  by  fitting  the  data  through  least  squares  estima¬ 
tion.  To  this  end,  we  used  ground  truth  synthetic  data  for 
parameter  estimation.  We  note  that  estimation  errors  above 
a  magnitude  of  10-4  correspond  to  an  error  of  one  meter  in 
the  real  object  space  for  a  typical  satellite  image  that  cov¬ 


ers  10  kilometers  in  one  dimension.  Accordingly,  Fig.  5a 


illustrates  how  our  proposed  algorithm  significantly  outper¬ 
forms  the  method  in  OTI  (LS). 

Furthermore,  we  apply  another  error  measure  that  intu¬ 
itively  shows  the  superiority  of  our  method  over  ED-  A 
2D  image  point  with  an  altitude  is  inversely  mapped  into 
the  object  space  and  then  projected  back  onto  the  same  im¬ 
age  to  generate  a  new  2D  point.  The  error  is  defined  as  the 
Euclidean  distance  in  pixels  between  the  two  points,  which 
ideally  should  equal  to  zero.  For  each  pixel  across  the  whole 
image,  we  try  100  normalized  altitudes  randomly  sampled 
within  the  valid  range  [—1,1],  and  report  the  average  error 
over  the  100  trials.  Fig.[6]shows  the  error  distribution  across 
a  image  of  resolution  7168  x  9216.  We  can  see  our  method 
has  low  and  uniform  errors  across  the  whole  image,  while 
the  method  by  EH  generates  prominent  and  structured  er¬ 
rors  (larger  than  2  pixels  for  some  regions). 

Solution  Cardinality.  Depending  on  RPC  model  pa¬ 
rameters  and  the  values  of  the  specific  estimation  instance, 
we  observed  in  our  experiments  either  one  or  three  real  val¬ 
ued  solutions.  Fig.[5b]illustrates  the  distribution  of  the  num¬ 
ber  of  real  valued  solutions.  It  can  be  seen  that  our  attained 
solutions  are  predominantly  unique.  Moreover,  there  is  al¬ 
ways  only  one  solution  with  both  of  the  estimated  values 


X*  and  Y*  within  the  valid  range  of  [—1, 1].  This  behavior 
follows  from  the  geometric  intuition  that  two  3D  points  with 
the  same  altitude  in  the  real  world  should  not  be  projected 
onto  the  same  pixel. 


6.2.  Triangulation 


Accuracy.  We  use  the  coordinate-wise  components 
{ r ! ,  C! ,  r 2  }  of  a  match  positioning  within  the  satellite  im¬ 
agery  to  estimate  the  3D  point  by  the  triangulation  solver, 
and  project  onto  I2  to  compute  the  estimated  column,  de¬ 
noted  as  Sjh  The  projection  error  is  defined  as  |c2  —  c]jj|. 
The  triangulation  error  is  defined  as  the  Euclidian  distance 
between  the  refined  3D  point  (Eq.  |8]l)  and  the  ground  truth. 
The  error  distributions  in  Figs.  7a  and  7b  show  both  of  the 


projection  errors  and  triangulation  errors  are  very  small. 
Moreover,  we  find  that  the  accuracy  of  triangulation  to  be 
inferior  to  that  of  our  estimated  inverse  mapping.  We  at¬ 
tribute  this  behavior  to  the  numerical  errors  from  the  G-J 
elimination  of  the  much  larger  elimination  template.  To 
validate  this  assumption,  we  also  implement  the  solver  in 
high-precision  floating  point  arithmetic  with  64  bits  of  sig- 
nificand  (mantissa),  in  addition  to  the  double  precision  im¬ 
plementation  (53-bit  mantissa).  The  results  in  Figs.  [7a]  and 
7b  support  numerical  precision  to  be  a  significant  compo¬ 
nent  of  the  estimation  error  associated  with  our  framework. 


Stability.  In  practice  our  estimation  process  is  faced 
both  with  measurement  errors  and/or  potential  mismatches 
in  the  2D  correspondences.  To  test  the  stability  against 
measurement  noise,  we  additively  perturb  input  coordinate 
measurements  with  zero-mean  Gaussian  noise  with  differ¬ 
ent  standard  deviations.  Fig.  [7c] shows  the  projection  error 
across  different  noise  levels.  From  Fig.  [7c]  it  can  be  seen  if 
the  Gaussian  noise  has  deviation  of  1  pixel,  the  90th  per¬ 
centile  of  the  projection  error  is  around  1.8  pixels.  The 
projection  error  is  used  for  outlier  correspondences  rejec¬ 
tion,  so  we  can  optionally  choose  2  pixels  as  the  threshold 
for  inlier  testing.  We  note  that  the  estimated  triangulation 
error  has  real  scale.  Moreover,  for  the  RPC  parameters  be¬ 
ing  used,  each  pixel  corresponds  to  0.5  meter  in  the  object 
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Figure  7:  (a,  b)  Errors  on  noise-free  data,  (c,  d)  Median  and  90th  percentile  errors  on  noisy  data,  (e)  Solution  cardinality. 


(a)  Satellite  image  (b)  Triangulated  point  cloud  (c)  Textureless  3D  mesh  model 

Figure  8:  Triangulation  results  on  the  real  satellite  images.  The  images  have  a  pixel  resolution  of  20460  x  20460. 


space.  Given  that,  the  triangulation  error  shown  in  Fig.[7d| 
have  reasonably  small  errors.  We  also  test  the  implementa¬ 
tion  with  64-bit  significand  in  presence  of  noise,  but  it  has 
little  benefit  in  errors,  which  means  the  measurement  noise 
dominates  the  errors.  Considering  the  efficiency  vs.  accu¬ 
racy  trade-off,  the  implementation  with  double  precision 
should  be  used  in  most  real  applications.  The  number  of 
real  solutions  is  shown  in  Fig.  [7e]  There  are  typically  mul¬ 
tiple  real  solutions  within  the  valid  range,  but  only  one  of 
them  has  small  projection  error. 


Real  images.  Although  the  synthetic  data  illustrates  the 
correctness  and  effectiveness  of  our  estimation  methods,  we 
also  test  the  algorithm  on  a  pair  of  real  satellite  images  cap¬ 
tured  by  the  Worldview-2  satellite.  To  compute  the  2D  Cor¬ 
respondences,  BRIEF  @  is  used  as  local  feature  descriptor, 
and  pairwise  Hamming  distance  is  used  to  measure  feature 
similarity.  One  of  the  images  used  for  stereo  triangulation 
and  the  associated  3D  point  cloud  are  shown  in  Fig.[8a|and 
Fig.  [8b]  We  generate  3D  mesh  using  Poisson  surface  recon¬ 
struction  EQ,  where  the  point  normal  information  required 


by  Poisson  surface  reconstruction  is  set  to  (0,  0, 1).  Fig.  8c 
shows  the  textureless  mesh  model. 


7.  Conclusion 


We  propose  two  novel  minimal  solvers  for  satellite  im¬ 
agery  based  computer  vision.  The  first  minimal  solver  tar¬ 
gets  image  based  stereo  estimation  by  establishing  an  ac¬ 
curate  inverse  mapping  from  image  space  to  object  space. 
Our  evaluations  demonstrate  the  advantage  of  the  proposed 
solver  over  existing  methods.  The  second  minimal  solver 
aims  at  3D  triangulation  from  satellite  images.  Based  on 
this  solver,  we  are  able  to  propose  a  robust  estimator  and  a 
refinement  for  increased  accuracy  of  the  triangulation. 
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